Improving supervised learning performance by using fuzzy clustering method to select training data
نویسندگان
چکیده
The crucial issue in many classification applications is how to achieve the best possible classifier with a limited number of labeled data for training. Training data selection is one method which addresses this issue by selecting the most informative data for training. In this work, we propose three data selection mechanisms based on fuzzy clustering method: center-based selection, border-based selection and hybrid selection. Center-based selection selects the samples with high degree of membership in each cluster as training data. Border-based selection selects the samples around the border between clusters. Hybrid selection is the combination of center-based selection and border-based selection. Compared with existing work, our methods do not require much computational effort. Moreover, they are independent with respect to the supervised learning algorithms and initial labeled data. We use fuzzy c-means to implement our data selection mechanisms. The effects of them are empirically studied on a set of UCI data sets. Experimental results indicate that, compared with random selection, hybrid selection can effectively enhance the learning performance in all the data sets, center-based selection shows better performance in certain data sets, border-based selection does not show significant improvement.
منابع مشابه
Data Selection Based on Fuzzy Clustering
When the number of training data is limited, the performance of supervised learning could be improved if valuable samples are selected for training. In this work, we propose a novel data selection method based on fuzzy clustering. Our method first partitions all the data which need to be classified into clusters. Then training data are selected from each cluster based on their membership degree...
متن کاملComposite Kernel Optimization in Semi-Supervised Metric
Machine-learning solutions to classification, clustering and matching problems critically depend on the adopted metric, which in the past was selected heuristically. In the last decade, it has been demonstrated that an appropriate metric can be learnt from data, resulting in superior performance as compared with traditional metrics. This has recently stimulated a considerable interest in the to...
متن کاملINTEGRATED ADAPTIVE FUZZY CLUSTERING (IAFC) NEURAL NETWORKS USING FUZZY LEARNING RULES
The proposed IAFC neural networks have both stability and plasticity because theyuse a control structure similar to that of the ART-1(Adaptive Resonance Theory) neural network.The unsupervised IAFC neural network is the unsupervised neural network which uses the fuzzyleaky learning rule. This fuzzy leaky learning rule controls the updating amounts by fuzzymembership values. The supervised IAFC ...
متن کاملText Categorization using the Semi-Supervised Fuzzy c-Means Algorithm
Text Categorization (TC) is the automated assignment of text documents to predefined categories based on document contents. For the past few years, TC has become very important essentially in the Information Retrieval area, where information needs have tremendously increased with the rapid growth of textual information sources such as the Internet. In this paper, we compare , for text categoriz...
متن کاملImproving Imbalanced data classification accuracy by using Fuzzy Similarity Measure and subtractive clustering
Classification is an one of the important parts of data mining and knowledge discovery. In most cases, the data that is utilized to used to training the clusters is not well distributed. This inappropriate distribution occurs when one class has a large number of samples but while the number of other class samples is naturally inherently low. In general, the methods of solving this kind of prob...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Journal of Intelligent and Fuzzy Systems
دوره 19 شماره
صفحات -
تاریخ انتشار 2008